Protein sequence redundancy reduction: comparison of various method
نویسندگان
چکیده
Non-redundant protein datasets are of utmost importance in bioinformatics. Constructing such datasets means removing protein sequences that overreach certain similarity thresholds. Several programs such as 'Decrease redundancy', 'cd-hit', 'Pisces', 'BlastClust' and 'SkipRedundant' are available. The issue that we focus on here is to what extent the non-redundant datasets produced by different programs are similar to each other. A systematic comparison of the features and of the outputs of these programs, by using subsets of the UniProt database, was performed and is described here. The results show high level of overlap between non-redundant datasets obtained with the same program fed with the same initial dataset but different percentage of identity threshold, and moderate levels of similarity between results obtained with different programs fed with the same initial dataset and the same percentage of identity threshold. We must be aware that some differences may arise and the use of more than one computer application is advisable.
منابع مشابه
Study on Genetic Diversity of Terminal Fragment Sequence of Isolated Persian Tobacco Mosaic Virus
Tobacco mosaic virus (TMV) is one of the devastating plant viruses in the world that infects more than 200 plant species. Movement protein plays a supportive role in the movement of other plant viruses, and viral coat protein is highly expressed in infected plants and affects replication and movements of TMV. In order to investigate genetic variation in the terminal fragment sequence in Iranian...
متن کاملStability Assessment of the Flexible System using Redundancy
In this study, dynamic behavior of a mooring line in a floating system is analyzed by probability approaches. In dynamics, most researches have shown the system model and environments by mathematical expression. We called this process as the forward dynamics. However, there is a limit to define the exact environments because of uncertainty. To consider uncertainty, we introduce the redundancy i...
متن کاملDengue virus type-3 envelope protein domain III; expression and immunogenicity
Objective(s): Production of a recombinant and immunogenic antigen using dengue virus type-3 envelope protein is a key point in dengue vaccine development and diagnostic researches. The goals of this study were providing a recombinant protein from dengue virus type-3 envelope protein and evaluation of its immunogenicity in mice. Materials and Methods: Multiple amino acid sequences of different i...
متن کاملA comprehensive experimental comparison of the aggregation techniques for face recognition
In face recognition, one of the most important problems to tackle is a large amount of data and the redundancy of information contained in facial images. There are numerous approaches attempting to reduce this redundancy. One of them is information aggregation based on the results of classifiers built on selected facial areas being the most salient regions from the point of view of classificati...
متن کاملRedundancy allocation problem for k-out-of-n systems with a choice of redundancy strategies
To increase the reliability of a specific system, using redundant components is a common method which is called redundancy allocation problem (RAP). Some of the RAP studies have focused on k-out-of-n systems. However, all of these studies assumed predetermined active or standby strategies for each subsystem. In this paper, for the first time, we propose a k-out-of-<em...
متن کامل